The open lexical infrastructure of Språkbanken

Authors

  • Lars Borin
  • Markus Forsberg
  • Leif-Jöran Olsson
  • Jonatan Uppström
Abstract

We present our ongoing work on Karp, the open lexical infrastructure of Språkbanken (the Swedish Language Bank), which has two main functions: (1) to support the work of creating, curating, and integrating our various lexical resources; and (2) to publish daily versions of the resources, making them searchable and downloadable. An important requirement for the lexical infrastructure is that we also maintain a strong bidirectional connection to our corpus infrastructure. At the heart of the infrastructure is the SweFN++ project, whose goal is to create free Swedish lexical resources geared towards language technology applications. The infrastructure currently hosts 15 Swedish lexical resources, including historical ones, some of which have been created from scratch using existing free resources, both external and in-house. The resources are integrated through links to a pivot lexical resource, SALDO, a large morphological and lexical-semantic resource for modern Swedish. SALDO has been selected as the pivot partly because of its size and quality, but also because its form and sense units have been assigned persistent identifiers (PIDs), to which the lexical information in other lexical resources and in corpora is linked.
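The pivot-based integration described in the abstract can be pictured with a minimal sketch. It is not Karp's actual data model or API: the identifier strings, resource names, and the helper `build_pivot_index` are all hypothetical, chosen only to show how entries from several resources might be grouped under the PIDs of a pivot resource.

```python
from collections import defaultdict

# Hypothetical pivot PIDs standing in for SALDO sense units.
# The real PID format is not given in the abstract.
pivot_pids = {"katt..1", "hund..1"}

# Hypothetical entries from two other resources: (resource, entry, pivot PID).
resource_entries = [
    ("frame_resource", "Animals", "katt..1"),
    ("frame_resource", "Animals", "hund..1"),
    ("historical_lexicon", "kattr", "katt..1"),
    ("historical_lexicon", "orphan", "missing..1"),
]


def build_pivot_index(entries, pivot):
    """Group entries by pivot PID and collect entries whose PID is unknown."""
    index = defaultdict(list)
    unlinked = []
    for resource, entry, pid in entries:
        if pid in pivot:
            index[pid].append((resource, entry))
        else:
            unlinked.append((resource, entry, pid))
    return dict(index), unlinked


index, unlinked = build_pivot_index(resource_entries, pivot_pids)
```

In this sketch, looking up one pivot PID in `index` returns every entry linked to it across resources, while `unlinked` lists entries that point to a PID the pivot does not contain.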

Similar resources

Korp - the corpus infrastructure of Språkbanken

We present Korp, the corpus infrastructure of Språkbanken (the Swedish Language Bank). The infrastructure consists of three main components: the Korp corpus pipeline, the Korp backend, and the Korp frontend. The Korp corpus pipeline is used for importing corpora, annotating them, and then exporting the annotated corpora into different formats. An essential feature of the pipeline is the ability...

Applying the CODAS Technique to Measure Urban Infrastructure in Metropolises of Iran

Aims: Due to the rapid expansion of urban areas, the lack of urban infrastructure in the country's metropolises is strongly felt. This infrastructure, in parallel with urban development, is vital for improving the quality of life in the country's metropolises. The present study was conducted using the CODAS multi-indicator technique with the aim of analyzing indicators related to urban infrastr...

Open Source Lexical Information Network

Currently, a large number of lexical resources are available: GENELEX, PAROLE, EuroWordNet and its follow-ups like GermaNet, MultiWordNet, etc. With this multitude of resources, the need arises for standardisation, in the guise of, for instance, the EAGLES, ISLE/MILE, EMELD, and TC37/SC4 projects. A current attempt within this standardisation framework is the conception of a network of lexical...

Papillon Lexical Database Project: Monolingual Dictionaries & Interlingual Links

This paper presents a new research and development project called Papillon. It started as a French-Japanese cooperation between laboratories GETA/CLIPS (Grenoble, France) and NII (Tokyo, Japan). Its goal is to build a multilingual lexical database and to extract from it digital bilingual dictionaries. The database is built with monolingual dictionaries, one for each language of the database, li...

The Effect of Lexical Collocational Density on the Iranian EFL Learners’ Reading Comprehension

The present study aims to investigate the effect of different levels of lexical collocational density on EFL learners’ reading comprehension. Eighty sophomore students with different levels of proficiency studying at Zand Institute of Higher Education in Shiraz, Iran, were chosen from among eighty-five learners based on their score distribution on a reduced TOEFL test constructed by Education...

Publication date: 2012